Dagstuhl seminar proposal „ Ontologies and Text Mining for Life Science“ 1/5
Ontologies and Text Mining for Life Sciences
Current Status and Future Perspectives
Dagstuhl, 25-28 March 2008
Executive Summary
Keywords: Text Mining, natural language processing, ontologies, ontology design,
machine learning, bioinformatics, medical informatics, knowledge management
1 Introduction
Researchers in Text Mining and researchers developing ontological resources both
provide solutions for properly preserving semantic information, i.e. in ontologies
and/or fact databases. Researchers from the two fields tend to work independently of
each other, but they share an interest in profiting from ongoing research in the
complementary domain. The relatedness of the two domains has led to the idea of
organizing a workshop that brings together members of both research communities.
2 The gap between Text Mining and ontologies
Life Science researchers report their findings in scientific publications. These documents
are nowadays distributed electronically and are increasingly processed by automatic
means so that their findings and data can also be incorporated into structured scientific
databases. Methods for this purpose are generally subsumed under the term “Text
Mining”, which encompasses techniques from machine learning, information
retrieval and natural language processing. Text Mining-based solutions have,
for instance, been developed for identifying protein-protein interactions and
gene regulatory events, for the functional annotation of proteins, for the identification
and prioritization of disease-related genes, and for the analysis of results from
high-throughput experiments.
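To make the first of these applications concrete, the sketch below shows one of the simplest baselines for finding candidate protein-protein interactions: match protein names from a dictionary in each sentence, normalize the different spellings to a single identifier, and count protein pairs that co-occur in the same sentence. This is an illustrative assumption, not a method described in this proposal. The lexicon, the `PROT:` identifier scheme and the function names are all hypothetical, and co-occurrence only yields candidate pairs, not confirmed interactions.

```python
import re
from itertools import combinations

# Hypothetical toy lexicon mapping surface forms to normalized identifiers.
# The "PROT:" prefix is an invented scheme; real systems draw such mappings
# from curated resources.
LEXICON = {
    "p53": "PROT:TP53",
    "tp53": "PROT:TP53",
    "mdm2": "PROT:MDM2",
    "brca1": "PROT:BRCA1",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def find_proteins(sentence):
    """Return the set of normalized IDs whose surface form occurs as a token."""
    tokens = re.findall(r"[A-Za-z0-9]+", sentence.lower())
    return {LEXICON[t] for t in tokens if t in LEXICON}


def cooccurrence_pairs(text):
    """Count unordered pairs of normalized proteins mentioned in the same sentence."""
    counts = {}
    for sentence in split_sentences(text):
        # Sorting makes each pair's key independent of mention order.
        for a, b in combinations(sorted(find_proteins(sentence)), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


if __name__ == "__main__":
    abstract = ("MDM2 binds p53 and promotes its degradation. "
                "TP53 mutations are frequent. BRCA1 was also studied.")
    print(cooccurrence_pairs(abstract))  # {('PROT:MDM2', 'PROT:TP53'): 1}
```

Because "p53" and "TP53" both map to `PROT:TP53`, the example also hints at the normalization step that links textual mentions to structured, database-style identifiers.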
Text Mining for the Life Sciences has received considerable interest over recent
years. It is now an established topic at conferences and workshops (e.g., ISMB,
KDD, ECCB, Coling, ACL, PSB) and has led to international large-scale challenge
events (KDD-Cup, Genomics track at TREC, BioCreative 1&2, BioNLP). This interest
is driven by two factors: the ever-increasing number of publications, which imposes an
unbearable workload on individual researchers, and the promising advances in natural
language processing and machine learning, which could solve this problem if they
are integrated into biomedical applications.
Text Mining has to cope with a large semantic gap between the raw textual data and
the representation of meaningful results in databases, e.g., the normalization of events
mentioned in the text to conceptual representations of events according to “textbook”
knowledge. It is hoped that ontologies can fill this gap by delivering a structured
representation of biomedical
knowledge. Although large and increasingly comprehensive biological ontologies